
Lesson 3: The ReAct Architecture (Reason + Act)

In the previous lesson's Decision Engine, we actually already wrote a solid set of fault-tolerance code.

If you look closely, you will notice two loops inside it:

  • The first loop handles responses that fail validation or do not match the requirements.

    The current strategy: append the error message to the history and ask again; once the maximum number of retries is exceeded, stop asking and return a conservative fallback answer.

    In the function decide_with_retry:

    for attempt in range(max_retries + 1):  # retry a bounded number of times, feeding errors back

        # If the previous response failed validation, add the error message to this round's conversation.
        if err_msg:
            messages.append({
                "role": "system",
                "content": f"Your previous output failed validation: {err_msg}. "
                           f"Output ONLY one valid JSON object that matches the schema."
            })

        # Get the model's response
        raw = call_llm(messages)

        # Error type 1: the response string is not strictly valid JSON
        try:
            obj = json.loads(raw)  # parse the str into a dict
        except Exception as e:
            err_msg = f"Invalid JSON parse error: {type(e).__name__}"
            continue

        # Error type 2: the response contains a logic error; check it with the validator defined earlier
        ok, reason = validate_decision(obj, tools_available)
        if ok:   # on success, return the decision dict: action, tool, tool_input, final
            return obj
        else:    # on failure, record the error reason for the next round
            err_msg = reason

        # Exponential backoff (simple version)
        time.sleep(0.4 * (2 ** attempt))
    
  • The second loop handles tool-call failures, although it is not actually written as a for loop.

    The current strategy: write the tool-call error into the tool's result (the Observation), append it to the chat history inside decide_with_retry, and ask the model again once (no retry loop). In effect, only a single tool-call failure is tolerated; this could also be changed to allow a configurable maximum number of failures (see the sketch after the code below).

    
    if decision1["action"] == "tool_call":   # if the first decision says "use a tool"
        tool = decision1["tool"]             # the tool's name
        tool_input = decision1["tool_input"] # the input the tool needs

        # Get the tool's return value (i.e. the observation)
        try:
            if tool == "search":
                query = tool_input.get("query", "")
                obs = TOOLS["search"](query)
            else:
                raise RuntimeError("Unknown tool")
        except Exception as e:
            obs = f"[TOOL_ERROR] {type(e).__name__}: {e}"

        # Step 2: inject the Observation and ask the model to output a final answer based on it
        decision2 = decide_with_retry(
            state={"goal": goal, "note": "Use the observation to answer."},  # tell the model to answer from the observation
            tools_available=tools_available,
            last_observation=obs
        )
        print("Decision2:", json.dumps(decision2, ensure_ascii=False))
    else:
        # No tool needed, output final directly
        print("Final:", decision1["final"])
    
    

(1) The ReAct Structure

This check-and-recover mechanism is exactly the ReAct architecture this lesson covers: it can handle failures of both model responses and tool calls without getting stuck in an infinite loop.

ReAct = Reason + Act + Observation

In other words:

  • At each step, the LLM produces a decision following the template (Reason), which we validate
  • We execute the tool the LLM chose (Act) and collect the feedback (Observation)
  • The Observation goes into the next round's conversation memory. Repeat until finish
[State] → LLM(Decision) → [Action JSON]
        → Validator / Guardrails
        → Tool Executor / Error Handler
        → Observation
        → Update State → (back to [State])
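The same cycle as a schematic Python skeleton (the full runnable version, run_cot_tool_agent, appears at the end of this lesson):

# Schematic only: each pass = Reason (decide + validate) → Act (tool) → Observation → state update
state = {"goal": goal}
observation = None
for step in range(max_steps):  # hard upper bound so the loop can never run forever
    decision = decide_with_retry(state, tools_available, observation)  # Reason, with guardrails
    if decision["action"] in ("finish", "ask_user"):
        break  # terminal actions
    if decision["action"] == "tool_call":  # Act
        try:
            observation = TOOLS[decision["tool"]](**decision["tool_input"])
        except Exception as e:
            observation = f"[TOOL_ERROR] {type(e).__name__}: {e}"  # errors become Observations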


(2) Fault-Tolerance Mechanisms

1. Checking the model's output (Validator)

raw = call_llm(...)
obj = json.loads(raw)
validate_decision(obj)

The Validator performs at least three layers of checks (sketched after this list):

  1. Structure: are all required fields present?
  2. Types: isinstance(...) checks
  3. Semantics:
    • is the action allowed?
    • does the tool exist?
    • are the fields consistent with the action?
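A compact sketch of the three layers (ALLOWED_ACTIONS and ALLOWED_TOOLS are the guardrail sets defined in section (4); the full validate_decision there implements these checks in detail):

from typing import Tuple

def validate_decision_sketch(obj: dict) -> Tuple[bool, str]:
    # 1. Structure: every required key must be present
    for k in ("action", "tool", "tool_input", "final"):
        if k not in obj:
            return False, f"Missing key: {k}"
    # 2. Types: minimal isinstance checks
    if obj["tool"] is not None and not isinstance(obj["tool"], str):
        return False, "tool must be a string or null"
    # 3. Semantics: action allowed, tool known, fields consistent with the action
    if obj["action"] not in ALLOWED_ACTIONS:
        return False, f"Invalid action: {obj['action']}"
    if obj["action"] == "tool_call" and obj["tool"] not in ALLOWED_TOOLS:
        return False, f"Tool not allowed: {obj['tool']}"
    if obj["action"] == "tool_call" and obj["final"] is not None:
        return False, "final must be null when action=tool_call"
    return True, "ok"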

2. Checking tool calls

A tool failure ≠ an Agent crash. A tool failure = a new Observation.

try:
    obs = tool(...)
except Exception as e:
    obs = f"[TOOL_ERROR] {type(e).__name__}: {e}"

Then:

  • Feed this obs back to the model as an Observation
  • Let the model decide whether to (a sketch follows this list):
    • switch tools
    • replan
    • ask_user
    • finish (degrade gracefully)
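A minimal sketch of this feedback step, reusing decide_with_retry from this lesson (the error string and the note wording are made up for illustration):

# The failed call is fed back as an ordinary Observation; the model picks the next action.
obs = "[TOOL_ERROR] TimeoutError: search timed out"  # illustrative error
decision = decide_with_retry(
    state={"goal": goal, "note": "The previous tool call failed. Choose another action."},
    tools_available=tools_available,
    last_observation=obs,
)
# decision["action"] may now be another tool_call, "replan", "ask_user", or "finish"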

Things you must never do ❌

  • raise the tool exception directly
  • hide the failure reason from the model
  • retry silently, without limit

3. How to keep the Agent out of infinite loops

You need at least four layers of protection (a combined sketch follows this list):

  • max_steps (hard upper bound)

for step in range(max_steps):

  • Action constraints (the model must not run wild)

    • no unlimited repetition of the same tool_call

    • a cap on consecutive replan actions

  • Detecting that the state has stopped changing

if state == last_state:
    force_replan()

  • Exponential backoff (which you just learned)
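A combined sketch of all four layers in one loop, reusing decide_with_retry and TOOLS from this lesson (the repeat threshold and the "stuck" note wording are illustrative choices):

# Sketch: max_steps + action constraints + state-change detection + backoff together.
import copy
import json
import time

def guarded_loop(goal: str, max_steps: int = 6) -> str:
    state = {"goal": goal}
    observation = None
    last_state = None
    last_action_key = None
    repeated_actions = 0

    for step in range(max_steps):                        # layer 1: hard upper bound
        decision = decide_with_retry(state, set(TOOLS), observation)

        # layer 2: action constraints — no unlimited repetition of the same action
        key = (decision["action"], json.dumps(decision["tool_input"], sort_keys=True))
        repeated_actions = repeated_actions + 1 if key == last_action_key else 0
        last_action_key = key
        if repeated_actions > 2:
            return "Unable to proceed. Please clarify the goal."

        # layer 3: state stopped changing — force a replan via the note
        if state == last_state:
            state["note"] = "Stuck: state unchanged. Replan with a different approach."
        last_state = copy.deepcopy(state)

        if decision["action"] in ("finish", "ask_user"):
            return decision["final"]
        if decision["action"] == "replan":
            state["note"] = "Replan: change approach."
            observation = None
        if decision["action"] == "tool_call":
            try:
                observation = TOOLS[decision["tool"]](**decision["tool_input"])
            except Exception as e:
                observation = f"[TOOL_ERROR] {type(e).__name__}: {e}"

        time.sleep(min(2.0, 0.4 * (2 ** step)))          # layer 4: exponential backoff
    return "Failed: exceeded max_steps."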

(3) RePlan When the Reasoning Chain Breaks Down

Sometimes you will see these telltale symptoms:

  • consecutive tool_call steps whose results are useless
  • the same action repeated over and over
  • outputs getting shorter and shorter / empty
  • the Validator failing repeatedly

This is why ReAct must have a built-in "self-rescue" capability (Re-evaluate / Replan).

Strategy 1: an explicit replan action

You have already allowed:

{"action": "replan"}

When replanning:

  • clear the observation

  • update the state:

    state["note"] = "Previous approach failed. Try a new plan."
    

Strategy 2: a forced meta prompt

Add one line to the system prompt:

If you are stuck or repeating actions, choose "replan".

Strategy 3: an external forced cut-off

if repeated_actions > 2:  # counter maintained as in the guarded-loop sketch above
    return "Unable to proceed. Please clarify the goal."


(4) Tool Schemas

If you look closely, you will notice that in the current code:

👉 nothing anywhere actually "specifies" that tool_search's input parameter must be named query
👉 the model outputs {"query": ...} purely as guessed / habitual / probabilistic behavior
👉 in engineering terms that is ❌ unsafe, ❌ uncontrollable, and ❌ bound to break eventually

So you need to declare the tool schema explicitly (for the model to see).

1. The prompt template

TOOL_SCHEMAS = {
    "search": {
        "description": "Search the web for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search keywords"
                }
            },
            "required": ["query"]
        }
    }
}

Inject the schema into the prompt (this is the key step):

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "system", "content": f"Tool schemas:\n{json.dumps(TOOL_SCHEMAS)}"},
    {"role": "user", "content": json.dumps({"state": state})}
]

This is how the model knows:

  • the search tool exists
  • it must provide query
  • query is a string
  • leaving it out is invalid

2. Upgrading the Validator to match

if action == "tool_call":
    schema = TOOL_SCHEMAS[tool]["parameters"]
    required = schema["required"]
    for k in required:
        if k not in obj["tool_input"]:
            return False, f"Missing required tool_input field: {k}"

3. Complete code

  1. Guardrails: action space + tool whitelist + tool schemas

import json
import time
import random
import requests
from typing import Any, Dict, Optional, Tuple

API_KEY = "YOUR_API_KEY"  # fill in your key
CHAT_URL = "https://api.openai.com/v1/chat/completions"  # or your provider's OpenAI-compatible endpoint

ALLOWED_ACTIONS = {"tool_call", "finish", "replan", "ask_user"}

TOOL_SCHEMAS = {
    "search": {
        "description": "Search the web for information. Use it when you need external facts.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keywords"},
            },
            "required": ["query"],
            "additionalProperties": False
        }
    }
}

ALLOWED_TOOLS = set(TOOL_SCHEMAS.keys())
  2. Tool implementation (a "fake search" for the demo; you can replace it with a real web search)
def tool_search(query: str) -> str:
    # Demo: returns fixed content, used to demonstrate Observation injection
    # You can replace it with a real search: SerpAPI / your own crawler / your web.run, etc.
    return f"[SEARCH_RESULT] query={query}\n- Agent = a system that uses an LLM to decide actions, can use tools, and updates state using observations."

TOOLS = {"search": tool_search}
  3. System Prompt: strong constraints
SYSTEM_PROMPT = f"""
You are an agent decision engine.

You MUST output exactly one JSON object. No markdown, no extra text.

Allowed actions: {sorted(ALLOWED_ACTIONS)}.
Allowed tools (only if action is tool_call): {sorted(ALLOWED_TOOLS)}.

Decision JSON Schema:
{{
  "action": "tool_call|finish|ask_user|replan",
  "tool": "string|null",
  "tool_input": "object|null",
  "final": "string|null"
}}

Rules (anti-hallucination):
- You MUST NOT fabricate any external facts.
- Tool results can ONLY come from an Observation provided by the system.
- If you need external info, choose action="tool_call" and specify tool/tool_input.
- When action="tool_call": final MUST be null.
- When action="finish": tool and tool_input MUST be null.
- If you are stuck or repeating, choose action="replan" or "ask_user".

Tool Schemas:
{json.dumps(TOOL_SCHEMAS, ensure_ascii=False)}
""".strip()
  4. Calling the LLM
def call_llm(messages, model="gpt-4o", max_tokens=280, temperature=0.2) -> str:
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    r = requests.post(CHAT_URL, headers=headers, json=payload, timeout=30)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
  5. Validators: strictly check the decision (structure + types + semantics + tool_input schema)
def _validate_tool_input_schema(tool: str, tool_input: Dict[str, Any]) -> Tuple[bool, str]:
    schema = TOOL_SCHEMAS[tool]["parameters"]
    required = schema.get("required", [])
    props = schema.get("properties", {})
    additional = schema.get("additionalProperties", True)

    # required fields
    for k in required:
        if k not in tool_input:
            return False, f"Missing required tool_input field: {k}"

    # type checks (minimal)
    for k, v in tool_input.items():
        if k not in props:
            if additional is False:
                return False, f"Unexpected tool_input field: {k}"
            continue
        expected_type = props[k]["type"]
        if expected_type == "string" and not isinstance(v, str):
            return False, f"tool_input.{k} must be string"
        if expected_type == "object" and not isinstance(v, dict):
            return False, f"tool_input.{k} must be object"

    return True, "ok"


def validate_decision(obj: Dict[str, Any], tools_available: set) -> Tuple[bool, str]:
    # required keys
    for k in ("action", "tool", "tool_input", "final"):
        if k not in obj:
            return False, f"Missing key: {k}"

    # action must be allowed
    action = obj["action"]
    if action not in ALLOWED_ACTIONS:
        return False, f"Invalid action: {action}"

    # semantic consistency for tool_call
    if action == "tool_call":
        tool = obj["tool"]
        if not isinstance(tool, str):
            return False, f"tool must be string when action=tool_call, got: {tool}"
        if tool not in tools_available:
            return False, f"Tool not allowed/available: {tool}"

        if not isinstance(obj["tool_input"], dict):
            return False, "tool_input must be an object when action=tool_call"

        ok, reason = _validate_tool_input_schema(tool, obj["tool_input"])
        if not ok:
            return False, reason

        if obj["final"] is not None:
            return False, "final must be null when action=tool_call"

    else:
        # non-tool_call: tool/tool_input must be null; final must be a str (ask_user/replan/finish all return human-readable text)
        if obj["tool"] is not None or obj["tool_input"] is not None:
            return False, "tool and tool_input must be null when action is not tool_call"
        if not isinstance(obj["final"], str):
            return False, "final must be a string when action is not tool_call"

    return True, "ok"
  6. decide_with_retry: automatically correct and retry on parse/validation failures
def decide_with_retry(
    state: Dict[str, Any],
    tools_available: set,
    last_observation: Optional[str] = None,
    max_retries: int = 2,
    model: str = "gpt-4o",
) -> Dict[str, Any]:

    base_messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": json.dumps({"state": state, "tools_available": sorted(tools_available)}, ensure_ascii=False)},
    ]

    if last_observation is not None:
        # Key point: inject the Observation as a system message, telling the model it is the only trusted source of external facts
        base_messages.append({"role": "system", "content": f"Observation:\n{last_observation}"})

    err_msg = None
    for attempt in range(max_retries + 1):
        messages = list(base_messages)
        if err_msg:
            messages.append({
                "role": "system",
                "content": f"Your previous output failed validation: {err_msg}. "
                           f"Output ONLY one valid JSON object that matches the schema."
            })

        raw = call_llm(messages, model=model, temperature=0.2, max_tokens=280)

        try:
            obj = json.loads(raw)
        except Exception as e:
            err_msg = f"Invalid JSON parse error: {type(e).__name__}"
            time.sleep(min(2.0, 0.4 * (2 ** attempt)) + random.random() * 0.1)
            continue

        ok, reason = validate_decision(obj, tools_available)
        if ok:
            return obj

        err_msg = reason
        time.sleep(min(2.0, 0.4 * (2 ** attempt)) + random.random() * 0.1)

    # Graceful degradation fallback
    return {
        "action": "ask_user",
        "tool": None,
        "tool_input": None,
        "final": "I couldn't produce a valid tool/action plan. Please clarify your goal and constraints."
    }
  7. Agent Loop: the model decides how many tool calls to make and when to stop
def run_cot_tool_agent(
    goal: str,
    max_steps: int = 6,
    model: str = "gpt-4o",
) -> str:
    """
    关键点:
    - 每一轮:LLM 输出 decision JSON
    - 若 tool_call:执行工具,得到 Observation,再喂回 LLM
    - 若 finish:返回 final
    - 若 replan:更新 state/重置 observation,继续
    - 若 ask_user:直接返回 final
    """
    tools_available = set(TOOLS.keys())
    state = {"goal": goal}
    observation = None

    for step in range(max_steps):
        decision = decide_with_retry(
            state=state,
            tools_available=tools_available,
            last_observation=observation,
            max_retries=2,
            model=model,
        )

        action = decision["action"]

        if action == "finish":
            return decision["final"]

        if action == "ask_user":
            return decision["final"]

        if action == "replan":
            # Simplest possible replan: write a note into state telling the model to change strategy
            state["note"] = "Replan: change approach. If external facts needed, call a tool."
            observation = None
            continue

        if action == "tool_call":
            tool = decision["tool"]
            tool_input = decision["tool_input"]

            # Run the tool: return exceptions to the model as an Observation (do not raise)
            try:
                obs = TOOLS[tool](**tool_input)
            except Exception as e:
                obs = f"[TOOL_ERROR] {type(e).__name__}: {e}"

            observation = obs
            continue

    return "Failed: exceeded max_steps. Please refine the goal."
if __name__ == "__main__":
    goal = "Explain what an agent is. If external facts are needed, use the search tool. Keep it concise."
    answer = run_cot_tool_agent(goal, max_steps=6, model="gpt-4o")
    print(answer)
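To see why the schema + whitelist pattern pays off, here is a hypothetical second tool (calc and its schema are invented for illustration); the guardrails, validator, and loop all pick it up without further changes:

# Hypothetical extension: register a second tool alongside search.
TOOL_SCHEMAS["calc"] = {
    "description": "Evaluate a simple arithmetic expression.",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '2 + 3 * 4'"},
        },
        "required": ["expression"],
        "additionalProperties": False
    }
}

def tool_calc(expression: str) -> str:
    # Demo only: charset-restricted eval; use a real expression parser in production.
    if not set(expression) <= set("0123456789+-*/(). "):
        return "[TOOL_ERROR] ValueError: unsupported characters"
    return f"[CALC_RESULT] {expression} = {eval(expression)}"

TOOLS["calc"] = tool_calc
ALLOWED_TOOLS = set(TOOL_SCHEMAS.keys())  # the whitelist follows the schemas automatically

# Caveat: SYSTEM_PROMPT embeds the schemas at definition time, so rebuild it
# (or construct it lazily) after registering new tools.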